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Abstract 

An important step of modeling spatially-referenced data is appropriately specifying the sec¬ 
ond order properties of the random held. A scientist developing a model for spatial data has a 
number of options regarding the nature of the dependence between observations. One of these 
options is deciding whether or not the dependence between observations depends on direction, 
or, in other words, whether or not the spatial covariance function is isotropic. Isotropy implies 
that spatial dependence is a function of only the distance and not the direction of the spa¬ 
tial separation between sampling locations. A researcher may use graphical techniques, such 
as directional sample semivariograms, to determine whether an assumption of isotropy holds. 
These graphical diagnostics can be dihicult to assess, subject to personal interpretation, and 
potentially misleading as they typically do not include a measure of uncertainty. In order to 
escape these issues, a hypothesis test of the assumption of isotropy may be more desirable. To 
avoid specification of the covariance function, a number of nonparametric tests of isotropy have 
been developed using both the spatial and spectral representations of random helds. Several 
of these nonparametric tests are implemented in the R package spTest, available on CRAN. We 
demonstrate how graphical techniques and the hypothesis tests programmed in spTest can be 
used in practice to assess isotropy properties. 


1 Introduction 


An important step in modeling a spatial process is choosing the form of the covariance function. 
The form of the covariance func tion will have an effect on kriging as well as parameter estimates 
and the associated uncertainty ( Cressid . 1993 . pg. 127-135). A common simplifying assumption 
about the spatial covariance function is that it is isotropic, that is, the dependence between sam¬ 
pling locations depends only on the distance between locations and not on their relative orientation. 
This assumption may not always be reasonable; for example, wind may lead to directional depen¬ 
dence in environmental monitoring data . Misspecificat ion of the second order properties can lead 
to misleading inference. ShermaiJ ( 2011 . pg. 87-90) and Guan et al.l (2004) summarize the effects of 
incorrectly specifying isotropy properties on kriging estimates through numerical examples. In order 
to choose an appropriate model, a statistician must first assess the nature of the spatial variation 
of his or her data. To check for anisotropy (directional dependence) in spatially-referenced data, a 
number of graphical techniques are available, s uch as directional sample variograms or rose diagrams 


(Matheron, 1961 : Isaaks and SrivastavaL 19891 pg. 149-154). Spatial statisticians may have intuition 


about the interpretation and reliability of these diagnostics, but a less experienced user may desire 
evaluation of assumptions via a hypothesis test. In this paper, we u se two real data exa mples to 
illustrat e how the no nparametric hypothesis tests available in the R (|R Core Teaml . 120141) package 
spTest ( Weller . 2015[l can be used to assess isotropy properties in spatially-referenced data. The 
examples also demonstrate how graphical techniques and hypothesis tests can be used in a comple¬ 
mentary role. The remainder of the paper is organized as follows: Section [5] establishes notation and 
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definitions; Section [3] describes the functionality of the spTest package; Section 0] demonstrates how 
to use the functions in spTest in conjunction with graphical techniques on two different data sets; 
Section [S] concludes the paper with a discussion. 


2 Notation 


2.1 Spatial statistics 


In Section 12.11 we briefly review key de finitions required fo r desc ribi ng and performing tests of 
i sotro py. For additional background, see Weller and HoetiiiS ( 2015 1 or Schabenberger and Gotwa^ 
(1200411 . Let {y(s) : s £ 2? C be a second order stationary random field (RF). Let {si,..., s„} C 
T) be the finite set of locations at which the random process is observed, providing the random 
vector Y = (Y(si),..., Y(s„))~''. The sampling locations may follow one of several spatial sampling 
designs, for example, gridded locations, randomly and uniformly distributed locations, a cluster 
design, or any other general design. Note tha t there is a distinct i on between a lattice process and a 
geost atistical process observed on a grid (^e.g.. lFnentes and Reichl . [20ir)HSchabenberger and Gotwavl 
l20n4 pg. 6-10), but we do not explore this distinction and will use the term grid throughout. 

For a spatial lag h = (/ii,/i 2 )^, the semivariogram function describes dependence between ob¬ 
servations, Y, at spatial locations separated by lag h and is defined as 


7(h) = ^Var(Y(s -k h) - Y(s)). 


The classical moment-based estimator of the semivariogram (IMatheronl Il962f) is 

1 


7 (h) = 


2|P(h)| 


^[Y(s)-Y(s + h)]2, 


( 2 . 1 ) 


( 2 . 2 ) 


where the sum is over Vih) = {s : s, s -(- h £ T>} and | 2 ?(h)| is the number of elements in T’(h). The 
set T’(h) is the set of sampling location pairs that are separated by spatial lag h. The covariance 
function is defined by 

( 7 (h) =Cov(Y(s),Y(s-kh)) (2.3) 

and is an alternative to the semivariogram for describing spatial dependence. When it exists, the 
covariance function, ( 7 (h), is related to the semivariogram by ( 7 (h) = ?iTO||v||-).oo7(v) — 7(h) if the 
limit exists. 

It is often of interest to infer the effect of covariates on the process, deduce dependence struc¬ 
ture, and/or predict Y with associated uncertainty at new locations. To achieve these goals, the 
practitioner must specify the distributional properties of the spatial process. A common assumption 
is that the Hnite-dimensional joint distribution is multivariate normal (MVN), in which case we call 
the RF a Gaussian random field (GRF). 

A common simplifying assumption about the spatial dependence structure is that it is isotropic. 

Definition 2.1.1. A second-order stationary spatial process is isotropic if the semivariogram, 7(h), 
[or equivalently, the covariance function C{h.)] of the spatial process depends on the lag vector h 
only through its Euclidean length, ||h||, i.e., 7(h) = 7g(||h||) for some function 7o(-) of a univariate 
argument. 


Isotropy implies that the dependence between any two observations depends only on the Euclidean 
distance between their sampling locations and not on their relative orientation. A spatial process 
that is not isotropic is called anisotropic. The methods in spTest are designed to assist in determin¬ 
ing whether or not the assumption of isotropy holds. Namely, the functions in spTest implement 
nonparametric hypothesis tests of isotropy, which avoid the choice of a parametric form for the 
covariance (semivariogram) function. 
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There are two important modifications to the estimator in Equation 12.21 that are pertinent to 
the methods described in this paper. First, for non-gridded sampling locations, the estimator needs 
to be modified to account for the fact that very few or no pairs of locations will be separated by a 
specific spatial lag, h. One solution to this challenge is to specify a distance tolerance, e, such that 
lags having length ||h|| ± e are included in estimating the semivariance at lag h. Second, directional 
sample semivariograms can be estimated by only using observations that are separated by spatial 
lags in a specific direction. For example, to investigate potential anisotropy, we can compare sample 
semivariograms between the horizontal and vertical directions. For non-gridded sampling locations, 
very few pairs of locations will lie at a specific distance and directional lag, so we need to allow for 
both a distance and a directional tolerance when estimating the semivariance. A common method 
for doing this is by using a product kernel smoother that smoothes over both the horizontal (hi) 
and vertical (h 2 ) components of the spatial lag h = (hi, h 2 )^. 

Spatial RFs and their second order properties can also be expressed in the spectral (or frequency) 
domain using Fourier transforms. The spectral representation of RFs and their second order prop¬ 
erties provides alternative methods for testing second order properties . For brevity we focus only on 
the methods in the spati al domain and refer the interested reader to Weller and Hoetin3 ( 2015 1 or 
Fuentes and Reich ( 2010l l. We note that that, in addi tion to the methods from the spatial domain, 
the spectral methods from iLu and ZimmermanI (1200511 are also implemented in spTest. 


2.2 Nonparametric tests of isotropy 

Lu (|l994ll and ILu and ZimmermM ( 200lll pioneered a popular approach to testing second-order 
properties when they used the asymptotic joint normality of the sample semivariogram computed 
at different spatial lags to e valuate symmetry and isot ropy properties. The subsequent works of 
Guan et al. ( 2004l200^ and Maitv and Sherm^ ( 2012l l built upon these id eas and are the pri mary 
methods programmed in s pTest. Here we give an overview of the tests in Guan et al.l ( 2004 1 and 
Maitv and Sherman ( 2012h . 

Under the null hypothesis that the RF is isotropic, it follows that the values of 7 (-) evaluated 
at any two spatial lags that have the same norm are equal, independent of the direction of the lags. 
For example, under the assumption of isotropy, 7 ((—6,0)) = 7 ((a/S, a/S )). To completely specify 
the null hypothesis of isotropy, theoretically, one would need to compare semivariogram values for 
an infinite set of lags. In practice, a small number of lags are specified. Then it is possible to test a 
hypothesis consisting of a set of linear contrasts of the form 


iJo : A7(.) = 0 


(2.4) 


as a p roxy for the null hypothesis of isotropy, where A is a full row rank matrix ( Lu and ZimmermanI . 
20011) . For example, a set of lags, denoted A, commonly used in practice on gridded sampling 
locations with unit spacing is 


A = {hi = (1,0), h2 = (0,1), ha = (l,l),h4 = (-1,1)}, 
and the corresponding A matrix under Hq : A 7 (A) = 0 is 


A = 


1-10 0 
0 0 1-1 


(2.5) 


( 2 . 6 ) 


In other words, by using Equations 12.51 and 12.61 under the hypothesis 12.41 we contrast the semi¬ 
variogram values at lags hi = ( 1 , 0 ) and h 2 = ( 0 , 1 ), and likewise, contrast semivariogram values at 
lags ha = ( 1 ,1) and h 4 = (—1,1). An important step in detecting anisotropy is the c hoice of lags, 
A, as the test results will only hold for the set of lags considered ( Guan et al.L 20041) . While this 
choice is somewhat subjective, there are several considerations for determining the set of lags. First, 
in terms of Euclidean distance, short lags should be chosen because estimates of the semivariance at 
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long lags will be less reliable than estimates at short lags because they are based on fewer pairs of 
observations and hence more variable. Second, pairs of orthogonal lags should be contrasted because, 
for an anisotropic process, the directions of strongest and weakest spatial correlation will typically 
be orthogonal. For other con s iderations and more detaile d gui delines regarding t he choice of lags, 
see Weller and Hoeting (|201^ . Lu and Zimmerman (l200l[), an d Guan et ^ ( 2004 ). 

The tests in Guan et al. (2004) and Maitv and Shermanl ( 2012h involve estimating either the 
semivariogram[2T]or covariogram l2.3l and evaluating the estimator at the set of chosen lags. Denoting 
the set of point estimates of the variogram/covariogram at the chosen lags as G„, the true values as 
G, and normalizing constant a„, a central result for all three methods is that 


an(G„ — G) 


MG7V(0,S), 


(2.7) 


under increasing domain asymptotics and mild moment and mixing conditions on the RF. The test 
statistic is a quadratic form 


TS = biiAGn )' (ASA ' )-'(AG„) 


( 2 . 8 ) 


where S is an estimate of the asymptotic variance-covariance matrix and is a normalizing constant. 
A p value can be obtained from the asymptotic distribution with degrees of freedom given by the 
row rank of A. Important differences in these works regard the distribution of sampling locations, 
shape of the sampling d omain, estimation of G and S, and assumptions on the dependence structure 
of the random field (see IWeller and Hoetind ( 2015 ) for more details). 


3 Nonparametric tests implemented in spTest 


The R package spTes t inclu des fu nctions for implemen t ing th e tests developed in iGnan et al.l (j2004l) . 
ILu and Zimmerman ( 2005 ). and Maitv and Sherman ( 2012 ). The primary functions in spTest for 
implementing these tests are Lu Test, MaityTest, GuanTestGrid, and GuanTestUnif . For example, 
the test from iGuan et al. (2004) for data observed at non-gridded, but uniformly distributed, sam¬ 
pling locations is implemented in the function GuanTestUnif, which takes the following arguments: 

GuanTestUnif(spdata, lagmat, A, df, h = 0.7, kernel = "norm", 
truncation = 1.5, xlims, ylims, grid.spacing = c(l, 1), 
window.dims = c(2, 2), subblock.h, sig.est.finite = T). 

There are several necessary inputs. The matrix spdata includes the coordinates of sampling locations 
and the corresponding data values. The spatial lags that will be used to estimate the semivariance, 
denoted A, are specified in the matrix lagmat. The matrix A provides the contrasts of the semivari¬ 
ance estimates, and its row rank is indicated by the parameter df (the degrees of freedom for the 
asymptotic distribution). The values h and kernel provide the bandwidth (smoothing) parame¬ 
ter and form of the kernel smoother, respectively, used to smooth over spatial lags when estimating 
the semivariance. If a normal smoothing kernel is used, then the truncation parameter indicates 
where to truncate the normal kernel (i.e., zero weight for values larger than this value). The param¬ 
eters xlims and ylims give the horizontal and vertical limits of the sampling region (a rectangular 
sampling region is assumed). A grid is laid over the sampling region according the width and height 
specified by grid, spacing. The dimensions of the moving windows, given in the units of the un¬ 
derlying grid, are determined by the values in window.dims. The bandwidth used to estimate the 
semivariance on the subblocks of data is indicated by subblock.h and a finite sample adjustment 
to the estimate of the asymptotic variance-covariance matrix is made by setting sig. est. finite = 
T. For more information a bout the different arguments and guidelines on how to choose them, see 
Weller and Hoeting ( 20151) . the spTest manual, and the original works. 
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4 Applications: Using spTest to check for anisotropy 


We demonstrate the functions in spTest on two data sets: the first containing gridded sampling 
locations, the second non-gridded sampling locations. For more details on the functions and more 
examples using simulated data, see the official spTest manual. The spTest package can be used 
independently of other package s built for ana l yzing spatial data , but it works nicely wi t h two other 
packages loaded into R : fields ( Nvchka et ah . 20141) and geoR ( Ribeiro Jr. and Digglel . [20011) . The 
package fields is automatical ly installed with spTest, while geoR is not. We also load the splines 
package ( R Core Teaml . [2015ll . which we use to fit mean functions. 


R> library("spTest") 
R> library("fields") 
R> library("geoR") 

R> library("splines") 


For the two examples given below, we use graphical diagnostics and the hypothesis tests imple¬ 
mented in spTest to determine whether or not an assumption of isotropy is reasonable for spatially- 
referenced data. The general strategy will be to first do exploratory data analysis (EDA) of the 
original data and create a model for the mean of the spatial process using appropriate covariates. 
After estimating a model for the mean, we extract residuals and again use EDA to check for remain¬ 
ing spatial dependence and utilize graphical diagnostics and hypothesis tests to investigate potential 
anisotropy. For brevity, we have not included the full version of EDA code and plots; instead, we in¬ 
clude only the most relevant to demonstrating the functionality of the spTest package. The complete 
version of the code is available on github. 


4.1 Gridded sampling locations 


The gridded data used in this sect ion come from the North American Regional Climate Change 
Assessment Program [NARCCAP] ( Mearns et al. . 20091) . The data set WRFG in spTest includes co¬ 
ordinates and a 24-year average of yearly average temperatures from runs of the Weather Research 
and Forecasting - Grell configuration (WRFG) regional climate model (RCM) using boundary con¬ 
ditions from the National Centers for Environmental Prediction (NCEP). The original data and the 
R code used to create the yearly averages are available online. The data set contains both latitude 
and longitude coordinates and map projection coordinates that specify the regular grid for 14,606 
grid boxes along with average temperature at each grid box. We can display a heat map of all of the 
data using the image.plot function from the fields package (see Figure ITal) . Due to computational 
considerations and because the methods in spTest assume stationarity, for our analysis we use a 
20 X 20 subset of the grid boxes over the central United States (see Figure ITbl) . 


R> data("WRFG") 

R> image.plot(WRFG$lon - 360, WRFG$lat, WRFG$WRFG.NCEP.tas, 

+ col = two.colors(n = 256, start = "blueS", end = "red3", 

+ middle = "gray60"), legend.lab = "Temp (K)", 

+ legend.cex = 0.8, legend.line = 2.2, xlab = "Longitude", 

+ yla-b = "Latitude", main = "Mean WRFG-NCEP Temperatures") 

R> world(add = T) 

R> coords <- expand.grid(WRFG$xc,WRFG$yc) 

R> sub <- which(coords [, 1] > 2.95e6 & coords[,1] < 4.0e6 
+ & coords[,2] > 1.2e6 & coords[,2] < 2.25e6) 

R> image.plot(WRFG$xc, WRFG$yc, WRFG$WRFG.NCEP.tas, 

+ col = two.colors(n = 256, start = "blue3", end = "red3", 
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Figure 1: Heat maps of average temperatures of WRFG-NCEP output from NARCCAP. Figure [Tal 
displays the average of WRFG-NCEP yearly average temperature over 24 years with coordinates 
in latitude and longitude. Pigure [TQ displays the same data but on the gridded map projection 
(easting/northing) coordinates, and the black points indicate the subset of the data used for this 
example. 


+ middle = "gra.y60"), legend, lab = "Temp (K) ", legend, cex = 0.8, 

+ legend.line = 2.2, xlab = "Easting", ylab = "Northing", 

+ main = "Map Projection Coordinates") 

R> points(coords [sub,], pch = 16, cex = 0.35) 

R> my.coords <- coords[sub,] 

R> colnames (my. coords) <- cC'xc", "yc") 

R> tas <- WRFG$WRFG.NCEP.tas 
R> tas <- c(tas) 

R> tas <- tas[sub] 

R> mydata <- cbind(my.coords[,1],my.coords[,2], tas) 

To investigate potential anisotropy in the relevant subset of these data, we can examine two 
graphical diagnostics: a heat map and directional sample semivariograms. We use the function 
variog4 from the geoR package to estimate directional semivariograms to visually assess isotropy 
properties. 

R> X.coord <- unique(my.coords[, 1]) 

R> y.coord <- unique(my.coords [, 2]) 

R> nx <- length(x.coord) 

R> ny <- length(y.coord) 

R> tas.mat <- matrix(mydata[, 3], nrow = nx, ncol = ny, 

+ byrow = F) 

R> image.plot(x.coord, y.coord, tas.mat, col = two.colors(n = 256, 

+ start = "blue3", end = "red3", middle = "gray60"), 

+ legend.lab = "Temp (K)", legend.cex = 0.8, legend.line = 2.2, 

+ yl3.b = "Northing", xlab = "Easting", main = "Subset of Temperatures") 


R> geodat <- as.geodata(mydata) 
R> svar4 <- variog4(geodat) 
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Figure 2: Graphical assessments of isotropy in the 20 x 20 subset of WRFG temperature data. 
Because northing (latitude) coordinates have not been accounted for, the heat map (123 indicates 
that the dependence between observations is stronger in the east-west direction than the north-south 
direction. The directional dependence is also evidenced by the differences between the directional 
sample semivariograms (I2b|) . 


R> plot(svar4) 

R> titleC'Directional Sample Semivariograms") 


The heat map in Figure [2a] indicates that the spatial process is anisotropic, having a stronger 
spatial dependence in the horizontal direction than the vertical direction. Intuitively, northing 
(latitude) is an important factor in determining average temperature, and we need to include its 
effect in a model for these data. We also notice non-linear trends in temperature as a function of 
easting (longitude) in Figure [2^ Thus, the anisotropy can be attributed, at least in part, to the 
fact that we have not modeled important covariates related to the process. The directional sample 
semivariograms in Figure reaffirm the notion that the data exhibit anisotropy as the 90° sample 
semivariogram appears much different than the other three. Before modeling the effects of northing 
and easting coordinates, we use the GuanTestGrid function from spTest to affirm our understanding 
that these data exhibit anisotropy. 

While some of the nonparametric methods for testing isotropy do not require Gaussian data, 
necessary conditions for the asymptotic properties of the test to hold are typically met when the 
data are Gaussian. A histogram (not shown) of the relevant subset of WRFG temperatu res indicates 
that a Gaussian assumption is reasonable. To implement the nonparametric test in iGuan et al 


(12004!) via the function GuanTestGrid, we need to specify the spatial lags that will be used to test 
for differences in the semivariogram. For this test we choose the lag set 


A = {hi = (1, 0),h2 = (0, 1), h3 = (1, l),h4 = (-1, 1)}, 


and we use the matrix A in Equation 12.61 to contrast the semivariogram estimates. Because the 
distance between sampling locations is 50,000, we set the the scaling parameter delta = 50,000. 
To create subblocks of data used to estimate S, we choose a moving window with size 4x4. The 
moving window dimensions should be chosen so that the window has the same shape (i.e., square 
or rectangle) and orientation as the sampling domain. To maximize the amount of data used to 
estimate S, the dimensions of the window should evenly divide the number of columns and rows. 
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respectively, of the entire region. The window dimensions should also be compatible with the spatial 
lags in A. For example, a window with dimensions of 2 x 2 cannot be used to estimate the variability 
of the semivariance at a lag with Euclidean distance longer than 1 / 2 , the maximum distance between 
locations in the moving window. For this example there are 20 rows and columns and we are using 
lags with spacings of one or two sampling locations; hence, window dimensions of 2 x 2 or 4 x 4 are 
reasonable choices. We run the test via the following code. 

R> my.delta <- 50000 

R> mylags <- rbind(c(l, 0), c(0, 1), c(l, 1), c(-l, 1)) 

R> myA <- rbind(c(1, -1, 0, 0), c(0, 0, 1, -1)) 

R> tr <- GuanTestGridCspdata = mydata, delta = my.delta, 

+ lagmat = mylags, A = myA, window.dims = c(4, 4), 

+ pt.est.edge = TRUE, sig.est.edge = TRUE, sig.est.finite = TRUE, 

+ df = 2) 

R> tr$alternative <- NULL 
R> tr 


Test of isotropy from Guan et. al. (2004) for gridded 
Scunpling locations using the Scunple semivariogrEun. 


data: mydata 

Chi-sq = 34.433, df = 2, p-value = 3.335e-08 

p-value (finite adj.) < 2.2e-16, number of subblocks: 240 

saunple estimates: (lag value) 

( 1 , 0 ) ( 0 , 1 ) ( 1 , 1 ) (- 1 , 1 ) 

0.03055723 0.08171415 0.10336776 0.10902089 


estimated asymp. variaince-covariance matrix: 

[,1] [,2] [,3] [,4] 
[1,] 0.009229206 0.005124418 0.002365263 0.01657042 
[2,] 0.005124418 0.032159967 0.016811371 0.04730438 
[3,] 0.002365263 0.016811371 0.060613653 -0.01891585 
[4,] 0.016570423 0.047304376 -0.018915852 0.12822978 


As expected, the results of the hypot hesis test (p value < 0.05) indicate that the data exhibit 
anisotropy. We note that for gridded data. iGuan et al. ( 2004 ) recommend using the p value computed 
via a finite sample correction. The function GuanTestGrid, and other functions in spTest, return a 
p value(s) for the test and information used in computing the p value, such as the point estimates 


(G„), estimates of the asymptotic variance-covariance matrix (S), the number of subblocks used 
to estimate S, and other information about the estimation process. Here we note that the point 
estimates for the directional semivariance are slightly different between the functions from the spTest 
and geoR packages due to different kernel methods used in estimation. 

As mentioned earlier, we need to model the effects of northing and easting (latitude and longitude) 
coordinates on average temperature. We fit temperature as a nonparametric additive function of 
both the northing and easting coordinates via least-squares using cubic splines. The cubic splines 
can be specihed using the function ns from the splines package and the least squares fit is computed 
via the Im function. 


R> xl <- my.coords [, 1] 

R> x2 <- my.coords [, 2] 

R> ml <- ImCtas ~ ns(xl, df = 3) + ns(x2, df = 3)) 
R> summary(ml) 









Heat Map of Residuals 
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Figure 3: Graphical assessments of isotropy in the residuals. The appearance of elongated areas 
of similar residual values in the heat map (1 3 all indicates that the process may be anisotropic. The 
directional semivariograms (03 do not appear to exhibit differences, indicating that the process is 
isotropic. 


After removing the mean effects of the coordinates, we can check for any remaining (unaccounted 
for) spatial dependence and evidence of anisotropy in the residuals using graphical diagnostics and 
a hypothesis test. A histogram of the residuals (not shown) indicates that a Gaussian assumption 
is reasonable. 

R> resid <- ml$resid 

R> resid.mat <- matrixfresid, nrow = nx, ncol = ny, byrow = F) 

R> image.plot(x.coord, y.coord, resid.mat, col = two.colors(n = 256, 

+ start = "blueS", end = "red3", middle = "graySO"), 

+ xlab = "Easting", ylab = "Northing") 

R> title("Heat Map of Residuals") 

R> resid.dat <- cbind(mydata[, 1:2], resid) 

R> geodat .resid <- as. geodata ("resid.dat) 

R> plot(variog4(geodat.resid)) 

R> title("Directional Sample Semivariograms") 

The clusters of similar values in the heat map of Figure [3al and the increase, followed by a leveling 
off, of the semivariance values as distance increases in the directional sample semivariograms in 
Figure [3bl indicate that the residuals are still spatially dependent. However, the plots in Figure 0] do 
not clearly illustrate whether or not the residuals exhibit anisotropy. There appears to be directional 
dependence along the NW to SE direction in the northern parts of the heatmap (Figure 03 . The 
directional sample semivariograms do not appear to be different until the distance is greater than 
200,000. Semivariogram estimates at large distances can be unreliable because there are fewer 
pairs of sampling locations to estimate this value than at short distances. Likewise, directional 
semivariograms are less reliable than a uni-directional semivariogram because fewer pairs of sampling 
locations are used at each distance for directional estimation. The unreliability of the semivariograms 
at the larger distances, coupled with the lack of a measure of uncertainty, make it difficult to 
determine whether or not an assumption of isotropy is reasonable. In hopes of gaining more insight 
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into the isotropy properties, we perform a nonparametric hypothesis test of isotropy using the 
residuals with the same choices for A, A, and the window dimensions. 

R> tr <- GuanTestGrid(spdata = resid.dat, delta = my.delta, 

+ lagmat = mylags, A = myA, window.dims = c(2, 2), 

+ pt.est.edge = TRUE, sig.est.edge = TRUE, sig.est.finite = TRUE, 

+ df = 2) 

R> tr$p.value.finite 

p.value.finite 
0.4281046 

Here the residuals do not provide evidence for anisotropy {p value > 0.05). These results sug¬ 
gest that it may be appropriate to choose an isotropic covariance function to model the residuals. 
However, it is important to note that we have not included the effect of other potentially influential 
covariates such as elevation or water cover in the model for temperature. Additionally, although we 
examined a 20 x 20 subset of the data, the grid boxes still cover a large geographic region of the 
U.S., and thus an assumption of stationarity, which is needed for the asymptotic properties of the 
hypothesis test to hold, may not be reasonable. 

4.2 Non-gridded sampling locations 

The non-gridded data set used in this section describes monthly surface meterology for the state 
of Colorado and comes from the National Center for Atmospheric Research (NCAR). The data are 
available in the package fields. For this example, our variable of interest is the log of the 30-year 
average of average yearly precipitation at 344 station locations during the time period 1968-1997. 

Like the temperature data, our goal will be to model the mean effect of covariates and check for 
spatial dependence and potential anisotropy in the residuals. Because the sampling locations cover 
a much smaller region than the subset of WRFG temperatures, we choose to use the latitude and 
longitude coordinates for this example. We can create a heat map of the log precipitation values 
and the elevation of the stations using the function quilt .plot from the fields package. 

R> dataC'COmontblyMet") 

R> sub30 <- CO.ppt[74:103, , ] 

R> nstations <- 376 
R> years <- 1968:1997 
R> nyears <- length(years) 

R> yr.avg <- matrix(data = NA, nrow = nstations, ncol = nyears) 

R> for (i in 1 -.nyears) { 

+ yr.dat <- sub30[i, , ] 

+ yr.avg[, i] <- apply(yr.dat, 2, mean, na.rm = T) 

+ } 

R> avg30 <- apply(yr.avg, 1, mean, na.rm = T) 

R> quilt.plot(CO.loc, log(avg30), col = two.colors(n = 256, 

+ start = "blue3", end = "red3", middle = "gray60"), 

+ legend.lab = "Precip (log mm)", legend.cex = 0.8, 

+ legend.line = 2.2, xlab = "Longitude", ylab = "Latitude", 

+ main = "Quilt Plot of log(precip)") 

R> US(add = T) 

R> quilt.plot(CO.loc, CO.elev, col = two.colors(n = 256, 

+ start = "blue3", end = "red3", middle = "gray60"), 
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Quilt Plot of log(precip) 


Quilt Plot of Elevation 



(a) 



(b) 


Figure 4: Heat maps showing the locations of the stations in a region of Colorado along with the 
log of average precipitation (l4all and elevation (libll at each station. 


+ legend.lab = "Elevation (meters)", legend.cex = 0.8, 

+ legend.line = 2.7, xlab = "Longitude", ylab = "Latitude", 

+ main = "Quilt Plot of Elevation") 

R> US(add = T) 

Colorado has two distinct geographic regions: the mountainous region in the west and the plains 
region in the east. Figure l4bl illustrates these two regions, and we can begin to notice a possible 
relationship between elevation and average precipitation. We explore the potential relationship 
between precipitation and elevation using scatter plots (see Figure [53 . 

R> plot(CO.elev, log(avg30), xlab = "Elevation", ylab = "log(precip)", 

+ main = "Scatter of log(precip) vs. Elevation") 

R> ml <- lm(log(avg30) ~ ns(CO.elev, df = 3)) 

R> summary(ml) 

R> fits <- ml$fitted.values 
R> bad <- is.na(avg30) 

R> X <- CO.elev[which(!bad)] 

R> ord <- order(x) 

R> X <- sort(x) 

R> fits <- fits[ord] 

R> lines(x, fits, Iwd = 3, col = "red") 

R> resid <- ml$resid 

R> hist(resid, freq = F, main = "Histogram of Residuals") 

R> lines(density(resid), Iwd = 2) 

We fit a cubic smoothing spline via least squares to model the relationship between log (precipitation) 
and elevation. The estimate is shown in Figure [5al and the histogram of residuals in Figure [5b] indi¬ 
cates that a Gaussian assumption is reasonable. We will use the residuals from this model to check 
for remaining spatial dependence and potential anisotropy. We use variog4 to estimate directional 
sample semivariograms. 
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(a) 


(b) 


Figure 5: Results from the model relating log(precipitation) and elevation. 


R> precip.resid <- cbind(CO.loc[which(!bad), ][, 1] , CO.loc[which(!bad), 
+ ][, 2], resid) 

R> geodat <- as.geodata(precip.resid) 

R> svar4 <- variog4(geodat) 

R> plot(svar4, legend = F) 

R> legendC'bottomright", legend = c(expression(0 * degree), 

+ expression("45 * degree), expression(90 * degree), 

+ expression(135 * degree)), col = 1:4, Ity = 1:4) 

R> title("Directional Sample Semivariograms") 


The increase, followed by a leveling off, of the semivariance values as distance increases in FigurelHl 
indicates that there is spatial dependence remaining in the data. We notice that the 0° semivariogram 
appears to be slightly different than the other three, but there is no attached measure of uncertainty. 
We again turn to a nonparametric hypothesis test of isotropy to assist in determining if an assumption 
of isotropy is reasonable. 

There are two testi ng procedures for te s ting i sotropy on non-gridded data available in spTest: 
Guan et al. ( 2004 1 and Maitv and Sherman ( 2012l l. To choose between these two, we need to decide 


whether or not it is reasonable to assume that sampling locations are unifo rmly distributed on the 
sampling domain. The methods for non-gridded data from lGua n et al. (l2004[l rely on th e assumption 
that sampling locations are uniformly distributed while the Maitv and ShermanI ( 2012 1 can be used 
on any general sampling design. To check this assumption, we can turn to methods from the spatial 
point process literature to perform a test of complete spatial randomness (GSR) (i.e., a uniform 
spatial distributio n) for the sampling location s. Methods for testing GSR are available in the R 


package spatstat (jBaddelev and Turned . 120041) . For brevity, we do not include those results here, 


and we will proceed assuming t hat GSR holds for these s a mplin g locations. 

For both Guan et al. ( 2004 1 and iMaitv and ShermanI ( 2012ll . we need to choose the lag set, A, 
and the contrast matrix, A. The large jumps and decrease in the semivariance values in Figure [5] 
indicate that semivariogram estimates become unreliable beyond a distance of 2; thus, we should 
choose lags having Euclidean distance less than this distance. We choose the lag set 


A = {hi = (3/4,0), h2 = (0,3/4), ha = (3/4,3/4), hi 


and again we use the matrix A in Eauation l2.6l 


(-3/4,3/4)}, 
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Directional Sample Semivariograms 
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Figure 6: A graphical assessment of isotropy in the residuals from the model relating 
log (precipitation) to elevation. 


R> mylags <- 0.75 * rbind(c(l,0), c(0,l), c(l,l), c(-l,l)) 
R> myA <- rbind(c(l, -1, 0 , 0), c(0, 0, 1, -1)) 


The next step to implement the methods from iGnan et al. ( 2004h and iMaitv and ShermanI (2012) 
is to determine the size of the moving windows and the block size, respectively, that will be used 
in estimating the asymptotic variance-covariance matrix, S. The moving window is shifted over 
the sampling region , creat ing subblocks of data used to estimate S. Likewise, for the test in 
iMaitv and Sherm an (12012), th e block size is used to implement the grid-based block bootstrap 
[GBBB] (iLahiri and Zhul . 2006li . 

There are two steps in making this determination. First, we place a grid over the sampling 
domain; second, we specify scaling parameters that will define the window/block size in terms of 
that grid. This necessary two step process for non-gridded locations should be completed keeping 
three goals in mind: (1) the number of samplin g locations per window/block, denoted rif,, should be 
approximately ^/n I Weller and Hoetind . 201 !t[ ): (2) the windows/blocks should have have the same 
orientation (i.e., square or rectangle) as the entire sampling domain; and (3) the scaling parameters 
should be compatible with the underlying grid. 

For the Colorado precipitation data, the dimensions of the sampling region are approximately 
8.5° X 5° (width x height), providing a total area of 42.5. For n = 344 uniformly distributed sampling 
locations, we expect approximately 344/42.5 = 8.09 sampling locations per unit area. Recalling goal 
(1), we seek to construct windows/blocks with rib ~ \/MA = 18.5, or equivalently, windows/blocks 
with an area of approximately 18.5/8.09 = 2.29. Goal (2) indicates we want to create rectangular 
windows/blocks with slightly larger width than height, and (3) says that if our grid divides the 
x-axis into 12 pieces, then the scaling parameter defining the width of the window/block should 
be 3 or 4 because those numbers evenly divide 12. For the CO precipitation data, if we choose 
our grid to divide the cc-axis into 16 pieces and the y-axis into 12 pieces, we have a grid with 
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{x,y) spacing of roughly (8.5/16,5/12) r; (0.53°, 0.42°). The resulting grid is plotted in Figure [T] 
Then, choosing our scaling parameters to be 4 x 3, we have windows/blocks with dimensions of 
approximately (4)(0.53) x (3)(0.42) = 2.12° x 1.23° and area of (2.12)(1.23) r: 2.61, or equivalently 
with Hb = (2.61)(8.09) r: 21. 

R> quilt.plot(precip.resid[, 1:2], precip.resid[, 3], col = two.colors(n = 256, 

+ start = "blue3", end = "red3", middle = "gray60"), 

+ xlab = "Longitude", ylab = "Latitude") 

R> titleC'Quilt Plot of Residuals and Grid Used for Subsampling") 

R> min(precip.resid[, 1]) 

[1] -109.483 

R> max(precip.resid[, 1]) 

[ 1 ] - 101.02 

R> min(precip.resid[, 2]) 

[1] 36.512 

R> max(precip.resid[, 2]) 

[1] 41.467 


R> my.xlims <- c(-109.5, -101) 

R> my.ylims <- c(36.5, 41.5) 

R> xlen <- my.xlims[2] - my.xlims[l] 

R> ylen <- my.ylims[2] - my.ylims [1] 

R> my.grid.spacing <- c(xlen/16, ylen/12) 

R> xgrid <- seq(my.xlims [1], my.xlims [2], by = my.grid.spacing[1]) 
R> ygrid <- seq(my.ylims[1], my.ylims[2], by = my.grid.spacing[2]) 
R> abline(v = xgrid, Ity = 2) 

R> abline(b = ygrid, Ity = 2) 


For the functions GuanTestUnif and MaityTest, the upper and lower limits of the sampling 
region in the x and y directions are given by the xlims and ylims arguments. Note that the values 
dehning the upper and lower limits should be slightly larger than the minimum and maximum 
observed x and y coordinates. Additionally, the spacing of the grid laid on the sampling region 
is defined by the grid, spacing argument, and the scaling parameters that define the size of the 
moving window and blocks in terms of the underlying grid are given by the window. dims and 
block.dims arguments, respectively. We recommend using visualizations of different grid choices 
and algebraic calculations, as done above, to assist in choosing a grid and window/block dimensions. 
When the scaling parameters defining the moving window or block dimensions are not compatible 
with the number of rows or columns of gridded sampling locations or the dimensions of the grid 
laid on the sampling region for non-gridded locations, the functions in spTest will print a warning 
message because they do not currently handle partial (incomplete) blocks. Likewise, if the window or 
block dimensions are too small for non-gridded sampling locations, the functions GuainTestUnif and 
MaityTest will discard (sub)blocks that do not have enough sampling locations and print a warning 
message. The p value of the hypothesis test wil l be sensitive to the choice of moving window and 
block dimensions. See lWeller and Hoetina ( 2015 ) and the original works for more recommendations 
on choosing these values. 

The next step for implementing the test in Guan et al.l ( 20041) is choosing the smoothing (band¬ 
width) parameters for smoothing over lags on the entire domain and within each subblock created 
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Quilt Plot of Residuals and Grid Used for Subsampling 



Figure 7: Because the sampling locations are not gridded, it can be difficult to assess isotropy 
properties via a heat map. 


by the moving windows. The smoothing parameters should be chosen based on the number and 
density of the sampling locations with larger values of the smoothing parameter inducing higher lev¬ 
els of smoothing, i.e., averaging over all directions. In our experience, smoothing parameter values 
between 0.6 and 0.9 tend to produce reasonable results when using a Gaussian smoothing kernel 
truncated at 1.5. However, the p value of the hypothesis test will change with the bandwidth. For 
this example, we choose a bandwidth of h = 0.7 for smoothing over lags on the entire domain and 
on the subblocks (sb.h) of data created by the moving window. We also use the default Gaussian 
smoothing kernel (kernel = "norm") truncated at 1. 5 (truncation = 1.5). 

Finally, for the test in iMaitv and ShermanI ( 2012h we need to choose the number of bootstrap 
resamples that will be used in the GBBB procedure to estimate S. We recommend using at least 50 
bootstrap samples; however, the bootstrapping procedure is computationally intensive. We choose 
nBoot = 100 bootstrap samples for our example, and we note that the number of bootstraps does 
not affect the precision of the p value, which is computed via the asymptotic distribution. Having 
determined values for the different options, we can now perform the hypothesis tests. 


R> myh <-0.7 
R> myh.sb <- 0.7 

R> tr.guan <- GuanTestUnif(spdata = precip.resid, lagmat = mylags, 

+ A = myA, df = 2, h = myh, kernel = "norm", truncation = 1.5, 

+ xlims = my.xlims, ylims = my.ylims, grid.spacing = my.grid.spacing, 

+ window.dims = c(4, 3), subblock.h = myh.sb) 

R> tr.guan$p.value 


p.value.chisq 
0.3608579 


R> tr.maity <- MaityTest(spdata = precip.resid, lagmat = mylags, 
+ A = myA, df = 2, xlims = my.xlims, ylims = my.ylims. 
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+ grid.spacing = my.grid.spacing, block.dims = c(4, 

+ 3), nBoot = 100) 

R> tr.maity$p.value 

p.value.chisq 
0.2122808 

For both tests, the residuals do not provide evidence in favor of anisotropy. Thus, in developing 
a spatial model for the residuals, it may be reasonable to assume isotropy. 


5 Discussion 


Choosing a covariance function is an important step in modeling spatially-referenced data and a va¬ 
riety of choices for the covariance function are available (e.g., anisotropy, nonstationarity, parametric 
forms). The R package spTest implements several nonparametric tests for checking isotropy prop¬ 
erties which avoid specifying a parametric form for the covariance function. One concern regarding 
the methods in spTest is that they tend to have low power when the anisotropy is weak (see, e.g., 
Weller and Hoetin3 . l2015ll . 

After determining whether or not an assumption of isotro py is reasonable, we c an ch oose a para¬ 
metric or nonparametric model for the covariance function. Weller and Hoetina (201^ include an 
illustration of the process of determining and modeling isotropy properties and how nonparametric 
tests of isotropy can be used in this process. We have demonstrated how graphical techniques and the 
functions in spTest can be used in a complementary role to check for anisotropy. Future work includes 
extending the functionality of spTest to allow implementing the tests on non-rectangular sampling 
domains. Additionally, computational efficiency can be improved by programming functions in C-|—[-. 
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